Crash Course in Statistical Models
Normal Regression
\[y_i = \beta_0 + \sum_j \beta_j x_j + \epsilon_i, \quad \epsilon_i \sim N(0,\sigma^2)\]
- Equivalently \(y_i \sim N(\beta_0 + \sum_j \beta_j x_j, \sigma^2)\)
- Estimation
- Least-squares: minimize squared residuals
- Maximum likelihood by normal pdf \[L_i = \frac{1}{\sigma}\phi\left(\frac{y_i-\beta_0-\sum_j \beta_j x_j}{\sigma}\right)\]
- Prediction of the expected value (regression to the mean) is deterministic: for most regression models it is expressed as \(f(\text{covariates})\)
- Probabilistic application through simulation
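Both estimation routes and the simulation-based application can be sketched as follows; the data and all coefficient values below are simulated/hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from y = b0 + b1*x + eps, eps ~ N(0, sigma^2)
n = 500
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)

# Least-squares estimation (minimizes squared residuals)
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Deterministic application: expected value at new covariates
x_new = 5.0
y_exp = beta_hat[0] + beta_hat[1] * x_new

# Probabilistic application: simulate outcomes by adding draws of eps
sigma_hat = np.std(y - X @ beta_hat, ddof=2)
y_sim = y_exp + rng.normal(0, sigma_hat, 10_000)
print(beta_hat, y_exp, y_sim.mean())
```

For a normal error, least squares and maximum likelihood give the same \(\beta\) estimates, so only the least-squares route is shown.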
Functions of Random Variables
- From probability theory courses: the random error term is transformed into the distribution of a new random variable via the Jacobian (change-of-variables) transformation
Log-Normal Regression
\[\ln(y_i) = \beta_0 + \sum_j \beta_j x_j + \epsilon_i, \quad \epsilon_i \sim N(0,\sigma^2)\]
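A brief sketch on simulated data: fit least squares on \(\ln(y)\), then back-transform. Note that \(\exp(X\beta)\) gives the median of \(y\); the mean requires the log-normal correction \(\exp(\sigma^2/2)\):

```python
import numpy as np

rng = np.random.default_rng(1)

# ln(y) = b0 + b1*x + eps, eps ~ N(0, sigma^2)  =>  y is log-normal
n = 2000
x = rng.uniform(0, 2, n)
y = np.exp(1.0 + 0.5 * x + rng.normal(0, 0.3, n))

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
sigma_hat = np.std(np.log(y) - X @ beta_hat, ddof=2)

# exp(X beta) is the median of y; the mean needs exp(sigma^2 / 2)
x0 = np.array([1.0, 1.0])
median_y = np.exp(x0 @ beta_hat)
mean_y = np.exp(x0 @ beta_hat + sigma_hat**2 / 2)
print(beta_hat, median_y, mean_y)
```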
Probit Regression
\[y_i^* = \beta_0 + \sum_j \beta_j x_j + \epsilon_i, \quad \epsilon_i \sim N(0,\sigma^2)\]
- Observed value is binary (0/1) \[Pr(y_i=1) = \pi_i = \Phi \left((\beta_0 + \sum_j \beta_j x_j)/\sigma \right)\] \[Pr(y_i=0) = 1 - \pi_i = 1 - \Phi \left((\beta_0 + \sum_j \beta_j x_j)/\sigma \right) = \Phi \left( - (\beta_0 + \sum_j \beta_j x_j)/\sigma \right)\]
- Estimation by maximum likelihood using the normal cdf \[L_i = \pi_i^{\delta_i}(1-\pi_i)^{1-\delta_i} \text{, } \delta_i=1 \text{ if } y_i=1, 0 \text{ otherwise}\]
- Variance normalized to 1 (homogeneous) or parameterized without a constant (heterogeneous)
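A minimal maximum-likelihood sketch of the probit model with the variance normalized to 1, on simulated data (all coefficients hypothetical):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)

# Latent y* = b0 + b1*x + eps, eps ~ N(0,1);  y = 1 if y* > 0
n = 3000
x = rng.normal(0, 1, n)
ystar = -0.2 + 0.8 * x + rng.normal(0, 1, n)
y = (ystar > 0).astype(float)

X = np.column_stack([np.ones(n), x])

def neg_loglik(beta):
    # L_i = pi^delta * (1-pi)^(1-delta) with pi = Phi(X beta); sigma fixed at 1
    pi = norm.cdf(X @ beta)
    pi = np.clip(pi, 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))

res = minimize(neg_loglik, np.zeros(2), method="BFGS")
print(res.x)
```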
Logistic Regression
- Observed value is binary (0/1) with \(\epsilon_i\) being logistically distributed this time \[Pr(y_i=1) = \pi_i = \frac{\exp (\beta_0 + \sum_j \beta_j x_j)}{1 + \exp(\beta_0 + \sum_j \beta_j x_j)}\] \[Pr(y_i=0) = 1 - \pi_i = \frac{1}{1 + \exp(\beta_0 + \sum_j \beta_j x_j)}\]
- Estimation by Maximum likelihood \[L_i = \pi_i^{\delta_i}(1-\pi_i)^{1-\delta_i} \text{ }\delta_i=1 \text{, if }y_i=1, 0 \text{ otherwise}\]
- Interpretation (scale-compounded parameters): the log-odds is linear in the covariates \[\text{logit}(\pi_i) = \ln\left(\frac{\pi_i}{1-\pi_i}\right)=\beta_0 + \sum_j \beta_j x_j\]
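A sketch of logit estimation by maximum likelihood on simulated data (coefficients hypothetical); the fitted slope is directly the change in log-odds per unit of \(x\):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# y = 1 with probability pi = exp(Xb) / (1 + exp(Xb))
n = 4000
x = rng.normal(0, 1, n)
eta = 0.3 + 1.2 * x
pi_true = 1 / (1 + np.exp(-eta))
y = (rng.uniform(size=n) < pi_true).astype(float)

X = np.column_stack([np.ones(n), x])

def neg_loglik(beta):
    eta = X @ beta
    # log L_i = y*eta - log(1 + exp(eta)), a numerically stable form
    return -np.sum(y * eta - np.logaddexp(0.0, eta))

res = minimize(neg_loglik, np.zeros(2), method="BFGS")
b0, b1 = res.x
# Interpretation: a one-unit increase in x raises the log-odds by b1
print(b0, b1)
```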
Probit/Logistic Regression
- Binary outcome model can be used to model market shares in aggregate demand models
- Pseudo-likelihood function \[L_i = \pi_i^{y_i}(1-\pi_i)^{1-y_i}\]
- \(i\) is the aggregation unit (e.g., zones, groups, etc.)
- \(y_i\) is the share (proportion) of \(y\) in unit \(i\)
Ordered Regression
- We often have ordinal measures whose categories cannot be assumed equally spaced (if they could, we would likely use a linear regression model)
- Questionnaire items for opinions
- Data that were originally measured at the interval/ratio level then grouped (lumped) into ordered categories (age, income)
- Qualitative measures that are not truly continuous
- Ordered regression is a useful tool for the situations above
- An important criterion for the ordered probability model is that the results remain consistent regardless of how the dependent variable is cut into categories
- That is, if a new category is added to an existing variable, the coefficients should remain the same regardless of the number of categories in the dependent variable
- Consider the latent variable model \[y_i^* = \beta x_i + \epsilon_i\] \[y_i = j \text{ if } \tau_{j-1} \leq y_i^* < \tau_j, \quad j \in \{1,2,\dots,J\}\]
- We have \(J\) levels and \((J-1)\) interior cut points \(\tau_1 < \tau_2 < \dots < \tau_{J-1}\). The boundary cut points \(\tau_0\) and \(\tau_J\) are \(-\infty\) and \(+\infty\), respectively.
- For example, if \(J=4\) \[y_i = 1 \text{ if } -\infty < y_i^* < \tau_1\] \[y_i = 2 \text{ if } \tau_1 \leq y_i^* < \tau_2\] \[y_i = 3 \text{ if } \tau_2 \leq y_i^* < \tau_3\] \[y_i = 4 \text{ if } \tau_3 \leq y_i^* < +\infty\]
- Now, replacing the latent variable by its linear (utility) specification \[Pr(y_i=j|x_i) = Pr(\tau_{j-1} \leq \beta x_i + \epsilon_i < \tau_j)\] \[Pr(y_i=j|x_i) = Pr(\tau_{j-1} - \beta x_i \leq \epsilon_i < \tau_j - \beta x_i)\] \[Pr(y_i=j|x_i) = \int_{-\infty}^{\tau_j - \beta x_i} f(\epsilon_i) d\epsilon_i - \int_{-\infty}^{\tau_{j-1} - \beta x_i} f(\epsilon_i) d\epsilon_i\] \[Pr(y_i=j|x_i) = F(\tau_j - \beta x_i) - F(\tau_{j-1} - \beta x_i)\]
- If we assume \(\epsilon_i\) follows a normal distribution with zero mean and \(\sigma^2\) variance, then
- Ordered Probit Regression \[Pr(y_i=j|x_i) = \Phi((\tau_j - \beta x_i)/\sigma) - \Phi((\tau_{j-1} - \beta x_i)/\sigma)\]
- Typically, set \(\sigma=1\)
- If we assume \(\epsilon_i\) follows a Type I Extreme Value distribution with scale \(\mu\)
- Ordered Logit Regression \[Pr(y_i=j|x_i) = \frac{\exp(\mu(\tau_j - \beta x_i))}{1 + \exp(\mu(\tau_j - \beta x_i))} - \frac{\exp(\mu(\tau_{j-1} - \beta x_i))}{1 + \exp(\mu(\tau_{j-1} - \beta x_i))}\]
- Typically assume \(\mu=1\)
- Ordered Probit model \[Pr(y_i=1|x_i) = \Phi(\tau_1 - \beta x_i) - 0\] \[Pr(y_i=2|x_i) = \Phi(\tau_2 - \beta x_i) - \Phi(\tau_1 - \beta x_i)\] \[Pr(y_i=3|x_i) = \Phi(\tau_3 - \beta x_i) - \Phi(\tau_2 - \beta x_i)\] \[Pr(y_i=4|x_i) = 1 - \Phi(\tau_3 - \beta x_i)\]
- Ordered Logit model \[Pr(y_i=1|x_i) = \exp(\tau_1 - \beta x_i)/(1 + \exp(\tau_1 - \beta x_i)) - 0\] \[Pr(y_i=2|x_i) = \exp(\tau_2 - \beta x_i)/(1 + \exp(\tau_2 - \beta x_i)) - \exp(\tau_1 - \beta x_i)/(1 + \exp(\tau_1 - \beta x_i))\] \[Pr(y_i=3|x_i) = \exp(\tau_3 - \beta x_i)/(1 + \exp(\tau_3 - \beta x_i)) - \exp(\tau_2 - \beta x_i)/(1 + \exp(\tau_2 - \beta x_i))\] \[Pr(y_i=4|x_i) = 1 - \exp(\tau_3 - \beta x_i)/(1 + \exp(\tau_3 - \beta x_i))\]
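A quick numerical check of the ordered probit probabilities with hypothetical cut points and coefficient; the four category probabilities telescope to 1:

```python
import numpy as np
from scipy.stats import norm

# Ordered probit with J = 4 categories: cut points tau_1 < tau_2 < tau_3
tau = np.array([-1.0, 0.0, 1.5])
beta, x_i = 0.6, 0.5
z = tau - beta * x_i

p = np.array([
    norm.cdf(z[0]),                   # Pr(y=1)
    norm.cdf(z[1]) - norm.cdf(z[0]),  # Pr(y=2)
    norm.cdf(z[2]) - norm.cdf(z[1]),  # Pr(y=3)
    1 - norm.cdf(z[2]),               # Pr(y=4)
])
print(p, p.sum())
```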
- Ordered Probit vs Ordered Logit model: Parameters of Probit and Logit are not comparable
- Normal error with unit variance in an Ordered Probit: variance of the ordered regression, \(\sigma^2=1\)
- Type I Extreme Value has variance \(\sigma^2=\pi^2/(6\mu^2)\). So, with a unit scale (\(\mu=1\)) in an Ordered Logit, the variance is \(\sigma^2=\pi^2/6\)
- Estimation: likelihood function \[L(y|x,\beta,\tau) = \prod_{i=1}^N\prod_{j=1}^J \left(F(\tau_j - \beta x_i) - F(\tau_{j-1} - \beta x_i) \right)^{z_{ij}}\] \[z_{ij}=1 \text{ if } y_i = j\] \[z_{ij}=0 \text{ otherwise }\]
- Useful to think of \(\tau_j\) as intercepts
- Standard practice is to interpret the intercepts as setting the baseline probabilities when \(x=0\)
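The likelihood above can be maximized directly; a sketch for an ordered probit on simulated data (all true values hypothetical):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)

# Simulate ordered probit data: y* = beta*x + eps, cut at tau = (-1, 0, 1)
n = 3000
x = rng.normal(0, 1, n)
ystar = 0.7 * x + rng.normal(0, 1, n)
tau_true = np.array([-1.0, 0.0, 1.0])
y = np.searchsorted(tau_true, ystar)  # categories 0..3

def neg_loglik(params):
    beta = params[0]
    tau = np.concatenate([[-np.inf], params[1:], [np.inf]])
    # Pr(y=j) = F(tau_j - beta*x) - F(tau_{j-1} - beta*x)
    upper = norm.cdf(tau[y + 1] - beta * x)
    lower = norm.cdf(tau[y] - beta * x)
    return -np.sum(np.log(np.clip(upper - lower, 1e-12, None)))

res = minimize(neg_loglik, np.array([0.0, -1.0, 0.0, 1.0]), method="Nelder-Mead")
print(res.x)  # [beta, tau_1, tau_2, tau_3]
```

In practice the cut points must stay ordered during optimization; starting from reasonable values, as here, usually suffices for a sketch.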
Zero-Inflated Ordered Regression
- Observed data have an overwhelming portion of zero values: zero-inflation \[y_i = 0 \text{, no further regression model}\] \[y_i > 0 \text{, } Pr(y_i=j|x)\]
- Estimation: Maximum likelihood \[y_i=0 \Rightarrow \eta_i \leq \sum_k \gamma_k z_k\] \[y_i > 0 \Rightarrow \eta_i > \sum_k \gamma_k z_k \text{ and } Pr(y_i=j|x)\] \[L_i = \Phi\left(\sum_k \gamma_k z_k\right)^{y_i=0} \times \left(\Phi\left(-\sum_k \gamma_k z_k\right)\left(F(\tau_j - \beta x_i) - F(\tau_{j-1} - \beta x_i)\right)\right)^{y_i=j}\]
- 2 models: model of binary zero/non-zero \(y\) & ordered regression model
Count Variable Regression
- Count (\(y_i\)) of events occurring randomly and uniformly in time with a constant expected rate of occurrence \[Pr(y_i) = \frac{\lambda^{y_i}e^{-\lambda}}{y_i!} \text{, } E[y_i] = V[y_i] = \lambda \text{, } \lambda = \exp(\sum_j \beta_j x_j)\]
- Likelihood function for estimation \[L(y_i|x,\beta) = Pr(y_i)\]
- Issues with Poisson model
- Heterogeneity can violate constant expected rate assumption
- An upper limit of the count data can exist, violating the \(E[y_i] = V[y_i]\) assumption and causing over/under-dispersion
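A maximum-likelihood sketch of the Poisson regression on simulated counts (coefficients hypothetical):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(5)

# Counts with rate lambda = exp(b0 + b1*x)
n = 2000
x = rng.uniform(0, 1, n)
lam = np.exp(0.5 + 1.0 * x)
y = rng.poisson(lam)

X = np.column_stack([np.ones(n), x])

def neg_loglik(beta):
    lam = np.exp(X @ beta)
    # log Pr(y) = y*log(lam) - lam - log(y!)
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1))

res = minimize(neg_loglik, np.zeros(2), method="BFGS")
print(res.x)
```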
Poisson Regression with Heterogeneity
- Existence of over-dispersion requires correction
- Assume additional gamma heterogeneity \[\lambda^* = \exp(\sum_j \beta_j x_j + u) = \exp(\sum_j \beta_j x_j) \times \exp(u)\]
- Consider a positive distribution for \(u\), a Gamma distribution with mean 1 and variance \(1/\theta=\alpha\)
- Results in a Negative Binomial regression model \[Pr(y_i) = \frac{\Gamma(\theta + y_i)}{\Gamma(y_i +1)\Gamma(\theta)}r_i^{y_i}(1-r_i)^{\theta} \text{, } r_i = \frac{\exp(\sum_j \beta_j x_j)}{\exp(\sum_j \beta_j x_j) + \theta}\]
- Likelihood function for estimation: \(L(y_i|x,\beta,\theta) = Pr(y_i)\), \(\theta\) is the dispersion parameter, which should be a positive value
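A quick check of the Negative Binomial pmf in this parameterization against scipy's `nbinom`, confirming the over-dispersed variance \(\lambda + \lambda^2/\theta\) (values of \(\lambda\) and \(\theta\) hypothetical):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom

# NB pmf as written above, with r = lam / (lam + theta)
def nb_pmf(y, lam, theta):
    r = lam / (lam + theta)
    logp = (gammaln(theta + y) - gammaln(y + 1) - gammaln(theta)
            + y * np.log(r) + theta * np.log(1 - r))
    return np.exp(logp)

lam, theta = 3.0, 2.0
y = np.arange(0, 50)
p = nb_pmf(y, lam, theta)

mean = np.sum(y * p)
var = np.sum(y**2 * p) - mean**2
print(mean, var)  # mean ~ lam; variance ~ lam + lam^2/theta > lam
```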
Zero-Inflated Count Regression
- Observed data have an overwhelming portion of zero values: zero-inflation \[y_i = 0 \text{, no further regression model}\] \[y_i > 0 \text{, } Pr(y_i|x) \text{ via Poisson or Negative Binomial}\]
- Estimation: Maximum likelihood \[y_i=0 \Rightarrow \eta_i \leq \sum_k \gamma_k z_k\] \[y_i > 0 \Rightarrow \eta_i > \sum_k \gamma_k z_k \text{ and } Pr(y_i|x)\] \[L_i = \Phi\left(\sum_k \gamma_k z_k\right)^{y_i=0} \times \left(\Phi\left(-\sum_k \gamma_k z_k\right)Pr(y_i)\right)^{y_i>0}\]
- 2 models: model of binary zero/non-zero \(y\) & count regression model
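A sketch of one observation's likelihood contribution as written above, with a probit zero/non-zero split and a Poisson count part (all inputs hypothetical):

```python
import numpy as np
from scipy.stats import norm, poisson

# Log-likelihood contribution of observation i:
# a probit split decides zero vs. non-zero; a Poisson models y > 0
def loglik_i(y, gz, lam):
    # gz = sum_k gamma_k z_k for observation i; lam = Poisson rate
    if y == 0:
        return np.log(norm.cdf(gz))
    return np.log(norm.cdf(-gz)) + poisson.logpmf(y, lam)

print(loglik_i(0, 0.5, 2.0), loglik_i(3, 0.5, 2.0))
```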
Use of Econometric Models
- The parameter of an explanatory variable makes little sense unless put into context
- Econometric models should be used for:
- Prediction of the dependent variable and/or the probability of its different values
- Meaningful interpretation of estimated model parameters
- Marginal effects (ME), where \(Pr(\cdot)\) denotes the probability of \(y_i\) (or of \(U_i\), \(V_i\) in choice models) \[ME = \frac{\partial y_i}{\partial x_j} \text{ or } ME=\frac{\partial Pr(\cdot)}{\partial x_j}\]
- Elasticity (E) \[E = \frac{\partial y_i}{\partial x_j}\frac{x_j}{y_i} \text{ or } E=\frac{\partial Pr(\cdot)}{\partial x_j}\frac{x_j}{Pr(\cdot)}\]
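A small numerical illustration for the logit case, where the marginal effect has the closed form \(\beta_j \pi_i (1-\pi_i)\) (coefficient and evaluation point hypothetical):

```python
import numpy as np

# Logit probability and its analytic marginal effect / elasticity at a point
def pi(b0, b1, x):
    return 1 / (1 + np.exp(-(b0 + b1 * x)))

b0, b1, x0 = -0.5, 0.8, 1.0
p = pi(b0, b1, x0)

# Marginal effect: d pi / d x = b1 * pi * (1 - pi)
me = b1 * p * (1 - p)
# Elasticity: (d pi / d x) * (x / pi)
el = me * x0 / p

# Numerical check by central finite differences
h = 1e-6
me_num = (pi(b0, b1, x0 + h) - pi(b0, b1, x0 - h)) / (2 * h)
print(me, el, me_num)
```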
Practical Significance
- Marginal effects and elasticities can be challenging to interpret
- Practical significance (PS) is an approach to provide a clear illustration of the significance of a variable – sometimes termed effect size
- Measured as:
- Continuous variables: % change in \(Pr(V_i)\) for a 1 standard deviation (SD) change in \(x_j\)
- Ex. % change in \(Pr(V_i)\) for a 1 SD change in income
- Discrete variables: % change in \(Pr(V_i)\) for presence of the variable
- Ex. % change in \(Pr(V_i)\) given a person is male
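A sketch of the continuous-variable computation for a hypothetical logit model and simulated income data:

```python
import numpy as np

# Practical significance for a continuous variable in a logit model:
# % change in Pr for a one-standard-deviation increase in x_j
rng = np.random.default_rng(6)
income = rng.lognormal(3.0, 0.5, 1000)  # hypothetical income data
sd = income.std()

b0, b1 = -2.0, 0.02  # hypothetical logit coefficients

def pi(x):
    return 1 / (1 + np.exp(-(b0 + b1 * x)))

x_bar = income.mean()
ps = 100 * (pi(x_bar + sd) - pi(x_bar)) / pi(x_bar)
print(f"{ps:.1f}% change in Pr for a 1 SD increase in income")
```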